{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Transformers installation\n",
    "! pip install transformers datasets evaluate accelerate\n",
    "# To install from source instead of the last release, comment the command above and uncomment the following one.\n",
    "# ! pip install git+https://github.com/huggingface/transformers.git"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Image-to-Image Task Guide"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Image-to-Image task is the task where an application receives an image and outputs another image. This has various subtasks, including image enhancement (super resolution, low light enhancement, deraining and so on), image inpainting, and more.\n",
    "\n",
    "This guide will show you how to:\n",
    "\n",
    "- Use an image-to-image pipeline for super resolution task,\n",
    "- Run image-to-image models for same task without a pipeline.\n",
    "\n",
    "Note that as of the time this guide is released, `image-to-image` pipeline only supports super resolution task.\n",
    "\n",
    "Let's begin by installing the necessary libraries.\n",
    "\n",
    "```bash\n",
    "pip install transformers\n",
    "```\n",
    "\n",
    "We can now initialize the pipeline with a [Swin2SR model](https://huggingface.co/caidas/swin2SR-lightweight-x2-64). We can then infer with the pipeline by calling it with an image. As of now, only [Swin2SR models](https://huggingface.co/models?sort=trending&search=swin2sr) are supported in this pipeline."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from transformers import pipeline\n",
    "from accelerate import Accelerator\n",
    "import torch\n",
    "# automatically detects the underlying device type (CUDA, CPU, XPU, MPS, etc.)\n",
    "device = Accelerator().device\n",
    "pipe = pipeline(task=\"image-to-image\", model=\"caidas/swin2SR-lightweight-x2-64\", device=device)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, let's load an image."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from PIL import Image\n",
    "import requests\n",
    "\n",
    "url = \"https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/cat.jpg\"\n",
    "image = Image.open(requests.get(url, stream=True).raw)\n",
    "\n",
    "print(image.size)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```bash\n",
    "# (532, 432)\n",
    "```\n",
    "\n",
    "<div class=\"flex justify-center\">\n",
    "     <img src=\"https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/cat.jpg\" alt=\"Photo of a cat\"/>\n",
    "</div>\n",
    "\n",
    "We can now do inference with the pipeline. We will get an upscaled version of the cat image."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "upscaled = pipe(image)\n",
    "print(upscaled.size)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```bash\n",
    "# (1072, 880)\n",
    "```\n",
    "\n",
    "If you wish to do inference yourself with no pipeline, you can use the `Swin2SRForImageSuperResolution` and `Swin2SRImageProcessor` classes of transformers. We will use the same model checkpoint for this. Let's initialize the model and the processor."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from transformers import Swin2SRForImageSuperResolution, Swin2SRImageProcessor \n",
    "\n",
    "model = Swin2SRForImageSuperResolution.from_pretrained(\"caidas/swin2SR-lightweight-x2-64\").to(device)\n",
    "processor = Swin2SRImageProcessor(\"caidas/swin2SR-lightweight-x2-64\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`pipeline` abstracts away the preprocessing and postprocessing steps that we have to do ourselves, so let's preprocess the image. We will pass the image to the processor and then move the pixel values to GPU."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pixel_values = processor(image, return_tensors=\"pt\").pixel_values\n",
    "print(pixel_values.shape)\n",
    "\n",
    "pixel_values = pixel_values.to(device)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can now infer the image by passing pixel values to the model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "\n",
    "with torch.no_grad():\n",
    "  outputs = model(pixel_values)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Output is an object of type `ImageSuperResolutionOutput` that looks like below 👇\n",
    "\n",
    "```text\n",
    "(loss=None, reconstruction=tensor([[[[0.8270, 0.8269, 0.8275,  ..., 0.7463, 0.7446, 0.7453],\n",
    "          [0.8287, 0.8278, 0.8283,  ..., 0.7451, 0.7448, 0.7457],\n",
    "          [0.8280, 0.8273, 0.8269,  ..., 0.7447, 0.7446, 0.7452],\n",
    "          ...,\n",
    "          [0.5923, 0.5933, 0.5924,  ..., 0.0697, 0.0695, 0.0706],\n",
    "          [0.5926, 0.5932, 0.5926,  ..., 0.0673, 0.0687, 0.0705],\n",
    "          [0.5927, 0.5914, 0.5922,  ..., 0.0664, 0.0694, 0.0718]]]],\n",
    "       device='cuda:0'), hidden_states=None, attentions=None)\n",
    "```\n",
    "\n",
    "We need to get the `reconstruction` and post-process it for visualization. Let's see how it looks like."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "outputs.reconstruction.data.shape\n",
    "# torch.Size([1, 3, 880, 1072])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We need to squeeze the output and get rid of axis 0, clip the values, then convert it to be numpy float. Then we will arrange axes to have the shape [1072, 880], and finally, bring the output back to range [0, 255]."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# squeeze, take to CPU and clip the values\n",
    "output = outputs.reconstruction.data.squeeze().cpu().clamp_(0, 1).numpy()\n",
    "# rearrange the axes\n",
    "output = np.moveaxis(output, source=0, destination=-1)\n",
    "# bring values back to pixel values range\n",
    "output = (output * 255.0).round().astype(np.uint8)\n",
    "Image.fromarray(output)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"flex justify-center\">\n",
    "     <img src=\"https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/cat_upscaled.png\" alt=\"Upscaled photo of a cat\"/>\n",
    "</div>"
   ]
  }
 ],
 "metadata": {},
 "nbformat": 4,
 "nbformat_minor": 4
}
